.\"                                                               -*-nroff-*-
.\" Copyright (c) 2018-2021  Joachim Wiberg <troglobit@gmail.com>
.\"
.\" Permission to use, copy, modify, and/or distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd Apr 19, 2021
.Dt WATCHDOGD 5 SMM
.Os
.Sh NAME
.Nm watchdogd.conf
.Nd watchdogd configuration file
.Sh DESCRIPTION
The default
.Xr watchdogd 8
use-case does not require a configuration file.  However, enabling a
health monitor plugin or the process supervisor is done using
.Pa /etc/watchdogd.conf .
.Pp
Available monitor plugins are:
.Bl -tag -width TERM
.It Cm supervisor
Process supervisor, monitor the heartbeat of processes
.It Cm filenr
File descriptor monitor, also covers sockets, and other descriptor based
resources
.It Cm loadavg
CPU load average monitor
.It Cm meminfo
Memory usage monitor
.El
.Sh SYNTAX
This file is a standard UNIX .conf file with sub-sections and '=' for
assignment.  The '#' character marks start of a comment to end of line,
and the '\\' character can be used as an escape character.  Whitesapce
is ignored, unless inside a string.
.Pp
.Sy Warning:
do not set the below WDT timeout and kick interval too low.  The daemon
(usually) runs as a regular
.Ql SCHED_OTHER
background task and the monitor plugins (as well as your other services)
need CPU time as well.
.Pp
.Bl -tag -width TERM
.It Cm timeout = Ar SEC
The WDT timeout before reset.  Default: 20 sec.
.It Cm interval = Ar SEC
The kick interval, i.e. how often
.Xr watchdogd 8
should reset the WDT timer.  Default: 10 sec
.It Cm safe-exit = Ar true | false
With safe-exit enabled (true) the daemon will ask the driver disable the
WDT before exiting (SIGINT).  However, some WDT drivers (or HW) may not
support this.  Default: disabled
.It Cm script = Ar "/path/to/reboot-action.sh"
Script or command to run instead of reboot when a monitor plugin reaches
any of its critical or warning level.  Setting this will disable the
default reboot action on critical, it is therefore up to the script to
perform reboot, if needed.  The script is called as:
.Bd -unfilled -offset indent
script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE
.Ed
.Pp
Health monitor plugins also have their own local script setting.
.It Cm reset-reason Ar {}
This section controls the reset reason, including the reset counter.  By
default this is disabled, since not all systems allow writing to disk,
e.g. embedded systems using MTD devices with limited number of write
cycles.
.Bl -tag -width TERM
.It Cm enabled = Ar true | false
Enable or disable storing reset cause, default: disabled
.It Cm file = Ar "/var/lib/watchdogd.state"
The default file setting is a non-volatile path, according to the FHS.
It can be changed to another location, but make sure that location is
writable first.
.El
.Pp
.Sy Note:
This section was previously called
.Cm reset-cause ,
which is deprecated and may be removed in a future release.
.El
.Ss Process Supervisor
.Bl -tag -width TERM
.It Cm supervisor Ar {}
Instrumented processes can have their main loop supervised.  Processes
subscribe to this service using the libwdog API, see the docs for more
on this.  When enabled
.Nm watchdogd
switches to
.Ql SCHED_RR
with elevated realtime priority.  When disabled it runs as a regular
.Ql SCHED_OTHER
process.
.Pp
.Bl -tag -width TERM
.It Cm enabled = Ar true | false
Enable or disable supervisor, default: disabled
.It Cm priority = Ar NUM
The realtime priority.  Default: 98
.It Cm script = Ar "/path/to/script.sh"
When a supervised process fails to meet its deadline the supervisor by
default performs an unconditional reset, saving the reset cause first.
However, if a script is provided in this section it will be called
instead:
.Bd -unfilled -offset indent
script.sh supervisor CAUSE PID LABEL
.Ed
.Pp
The
.Ar CAUSE
value is documented in
.Xr watchdogctl 1 .
.Pp
The
.Ar LABEL
can be any free form string the supervised process used when registering
with the supervisor, hence it is given as the last argument to the
script.
.Pp
The return value of the script determines how the system continues to
operate: POSIX OK (0) means the script has handled the situation in some
manner and
.Nm watchdogd
stops supervising the offending process, a non-zero return value from
script means the script has either failed to handle the situation or
prefers to delegate to
.Nm watchdogd
to save the reset cause and perform the actual system reset.
.Pp
The global script setting does not apply to this section.  However, the
same script can be used, due to the unique first argument.
.Pp
.Cm IMPORTANT:
Calling
.Xr watchdogctl 1
from the script with the
.Ar fail
command will cause an infinite loop.  It is strongly advised to return
non-zero from the script instead.
.El
.El
.Ss File Descriptor Monitor
.Bl -tag -width TERM
.It Cm filenr Ar {}
Monitors file descriptor leaks based on
.Ql /proc/sys/fs/file-nr .
.Bl -tag -width TERM
.It Cm enabled = Ar true | false
Enable or disable plugin, default: disabled
.It Cm interval = Ar SEC
Poll interval, default: 300 sec
.It Cm logmark = Ar true | false
Log current stats every poll interval.  Default: disabled
.It Cm warning = Ar LEVEL
High watermark level, alert sent to log.
.It Cm critical = Ar LEVEL
Critical watermark level, alert sent to log, followed by reboot or
script action.
.It Cm script = Ar "/path/to/reboot-action.sh"
Optional script to run instead of reboot if critical watermark level is
reached.  If omitted the global
.Ql script
action is used.  The scripts are called in the same way as the global
script, same arguments.
.El
.El
.Ss CPU Load Average Monitor
.Bl -tag -width TERM
.It Cm loadavg Ar {}
Monitors load average based on
.Xr sysinfo 2
from
.Ql /proc/loadavg .
The trigger level for warning and critical watermarks is composed from
the average of the 1 and 5 min marks.
.Pp
.Sy Note:
load average is a blunt instrument and highly use-case dependent.  Peak
loads of 16.00 on an 8 core system may be responsive and still useful
but 2.00 on a 2 core system may be completely bogged down.  Read up on
the subject and test your system before enabling the critical level.
.Bl -tag -width TERM
.It Cm enabled = Ar true | false
Enable or disable plugin, default: disabled
.It Cm interval = Ar SEC
Poll interval, default: 300 sec
.It Cm logmark = Ar true | false
Log current stats every poll interval.  Default: disabled
.It Cm warning = Ar LEVEL
High watermark level, alert sent to log.
.It Cm critical = Ar LEVEL
Critical watermark level, alert sent to log, followed by reboot or
script action.
.It Cm script = Ar "/path/to/reboot-action.sh"
Optional script to run instead of reboot if critical watermark level is
reached.  If omitted the global
.Ql script
action is used.  The scripts are called in the same way as the global
script, same arguments.
.El
.El
.Ss Memory Usage Monitor
.Bl -tag -width TERM
.It Cm meminfo Ar {}
Monitors free RAM based on data from
.Ql /proc/meminfo .
.Bl -tag -width TERM
.It Cm enabled = Ar true | false
Enable or disable plugin, default: disabled
.It Cm interval = Ar SEC
Poll interval, default: 300 sec
.It Cm logmark = Ar true | false
Log current stats every poll interval.  Default: disabled
.It Cm warning = Ar LEVEL
High watermark level, alert sent to log.
.It Cm critical = Ar LEVEL
Critical watermark level, alert sent to log, followed by reboot or
script action.
.It Cm script = Ar "/path/to/reboot-action.sh"
Optional script to run instead of reboot if critical watermark level is
reached.  If omitted the global
.Ql script
action is used.  The scripts are called in the same way as the global
script, same arguments.
.El
.El
.Ss Generic Script Monitor
.Bl -tag -width TERM
.It Cm generic Ar {}
Monitor a generic script.  Trigger warning and critical actions based on
the exit code of the script.
.Bl -tag -width TERM
.It Cm enabled = Ar true | false
Enable or disable plugin, default: disabled
.It Cm interval = Ar SEC
How often to run the
.Cm monitor-script ,
default: 300 sec
.It Cm timeout = Ar SEC
Maximum amount of seconds
.Cm monitor-script
is allowed to run, default: 300 sec
.It Cm warning = Ar VAL
High watermark level, alert sent to log if exit status from
.Cm monitor-script
is greater or equal to this value.
.It Cm critical = Ar VAL
Critical watermark level, alert sent to log, followed by reboot or
.Cm script
action if
.Cm monitor-script
exit status is greater or equal to this value.
.It Cm monitor-script = Ar "/path/to/generic-script.sh"
Monitor script to run every
.Cm interval
seconds.
.It Cm script = Ar "/path/to/reboot-action.sh"
Optional script to run instead of reboot if critical watermark level is
reached.  If omitted the global
.Ql script
action is used.  The scripts are called in the same way as the global
script, same arguments.
.El
.El
.Sh EXAMPLE
.Bd -unfilled -offset indent
### /etc/watchdogd.conf
timeout   = 20
interval  = 10
safe-exit = false

supervisor {
    enabled  = true
    priority = 98
}

reset-reason {
    enabled = true
    file    = "/var/lib/watchdogd.state"
}

### Checkers/Monitors ##################################################
#
# Script or command to run instead of reboot when a monitor plugin
# reaches any of its critical or warning level.  Setting this will
# disable the built-in reboot on critical, it is therefore up to the
# script to perform reboot, if needed.  The script is called as:
#
#    script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE
#
#script = "/path/to/reboot-action.sh"

# Monitors file descriptor leaks based on /proc/sys/fs/file-nr
filenr {
    enabled  = true
    interval = 300
    logmark  = false
    warning  = 0.9
    critical = 0.95
#    script = "/path/to/alt-reboot-action.sh"
}

# Monitors load average based on sysinfo() from /proc/loadavg
# The level is composed from the average of the 1 and 5 min marks.
loadavg {
    enabled  = true
    interval = 300
    logmark  = false
    warning  = 1.0
    critical = 2.0
#    script = "/path/to/alt-reboot-action.sh"
}

# Monitors free RAM based on data from /proc/meminfo
meminfo {
    enabled  = true
    interval = 300
    logmark  = false
    warning  = 0.9
    critical = 0.95
#    script = "/path/to/alt-reboot-action.sh"
}

# Generic site-specific script
generic {
    enabled  = true
    interval = 60
    timeout = 10
    warning = 10
    critical = 100
    monitor-script = "/path/to/monitor-script.sh"
#    script = "/path/to/alt-reboot-action.sh"
}
.Ed
.Sh SEE ALSO
.Xr watchdogd 8
.Xr watchdoctl 1
.Sh AUTHORS
.Nm
is an improved version of the original, created by Michele d'Amico and
adapted to uClinux-dist by Mike Frysinger.  It is maintained by Joachim
Wiberg at
.Lk https://github.com/troglobit/watchdogd "GitHub" .
